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ABSTRACT 


The exhaustion of available public IPv4 addresses has had a significant impact in an ex¬ 
panding, networked world and has led to increased adoption of IPv6. As IPv6 becomes 
more commonplace, it permits abusive and malicious parties to exploit both new and exist¬ 
ing vulnerabilities. Among such vulnerabilities is abusive electronic messaging, or spam. 
To better understand the impact of spam utilizing IPv6 as its delivery protocol, this study 
focused on both real-world IPv6 spam collected from large production domain and IPv6 
spam laboratory measurements. This study used various network traffic analysis tools to 
detect, classify, and associate IPv6 spamming behavior, both at the victim mail exchanger 
and among IPv6 wide-area routes. Furthermore, popular mail transfer agents were tested 
in an effort to profile their IPv6 behavior and correlate with spam obtained from the real 
world production domain. Results show that while IPv6 spamming behavior is growing, 
it is still in its infancy and no outstanding characteristics emerged that allow for definitive 
classification as a dominant IPv6 spamming behavior. 
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CHAPTER 1: 
Introduction 


An Internet Protocol (IP) address is a distinct 32 bit or 128 bit unsigned integer that is 
assigned to network interfaces connected to the Internet. The Internet Assigned Numbers 
Authority (IANA) is responsible for assigning IP address blocks to regional Regional Inter¬ 
net Registries (RIR), which, in turn, assign those blocks to conglomerates and individuals 
based on geographic localities. An IP address is either an Internet Protocol version 4 (IPv4) 
address (32-bit) or an Internet Protocol version 6 (IPv6) address (128 bit) and is used to 
route data between devices connected to the Internet. There are about 4 billion possible IP 
addresses within the 32-bit IPv4 address space, but because contiguous addressing is re¬ 
quired for routing requirements, the actable number of usable IPv4 addresses for end-hosts 
is much smaller. In 2011, IANA’s available IPv4 address pool officially depleted [1], [2] 
and, as of this writing, some RIRs have only a small number of available address blocks 
remaining. With this exhaustion of IPv4 addresses, more and more organizations and indi¬ 
viduals are adopting IPv6 [3]. 

IPv6 was created in 1998 in anticipation of the exhaustion of IPv4 addresses, but it was 
not initially widely adopted due to an increased complexity in addressing, cost associated 
with equipment upgrades, management cost, and the availability of interim solutions such 
as Network Address Translation (NAT) [4]. IPv6 is formally defined in Request For Com¬ 
ments (RFC) 2460. IPv6 is not specifically interoperable with IPv4, and essentially acts 
as a parallel, independent network of IPv4 [5]. To facilitate network traffic between the 
differing protocols, an IPv6 network can use a translation technology such as 6to4, 6in4, or 
Teredo tunneling protocol that allows IPv6 traffic to be encapsulated and routed over IPv4 
networks. Today, major network service providers, equipment manufacturers, and the U.S. 
government utilize IPv6 in both their operating systems and network switching devices in 
order to take advantage of IPv6 and its addressing capability [6], [7]. 

Due to the rise in IPv6 adoption, it is natural to expect a corresponding rise in malicious 
traffic that is routed using IPv6. Using IPv6 has the potential to afford attackers and mali¬ 
cious traffic several advantages. Since IPv6 has less usage in a production environment than 
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IPv4, firewall configurations detecting malicious IPv6 traffic are not as well documented, 
configured, and deployed as their IPv4 counterparts, intrusion detection systems (IDSs) 
rely on expected traffic profiles to catch anomalous or malicious behaviors, most of which 
use profiles that have little to no exposure as to how a normal IPv6 traffic profile presents. 
Another vulnerability in IPv6 that does not exist in IPv4 lies in how Internet Control Mes¬ 
sage Protocol version 6 (ICMPv6) is used. Firewalls must permit ICMPv6 traffic in order to 
allow IPv6 to operate correctly. IPv4 firewalls traditionally block Internet Control Message 
Protocol (ICMP) traffic due to widely known vulnerabilities, but ICMPv6 messages must 
be allowed through firewalls for IPv6 services to function correctly, even though without 
careful configuration ICMPv6 messages can be easily used to conduct denial of service 
attacks (DoSs) or profile networks [8]. Since some systems use IPv4-IPv6 tunneling tech¬ 
nologies, it does not take a great deal of effort for a malicious entity to inject malicious 
traffic if they know which routers are being used to tunnel IPv6 traffic over an IPv4 net¬ 
work. Until such a time that IPv6 is utilized and studied as much as IPv4, more and more 
vulnerabilities will be discovered and exploited in IPv6, making it more advantageous for 
a nefarious user to use IPv6 in the early stages of its deployment [9]. 

The goal of this thesis is to have a better understanding of how one form of malicious and 
abusive traffic, spam, flows through the Internet with IPv6. Due to this anticipated rise in 
abusive IPv6 traffic, we endeavored to further the study of spam in IPv6 by examining col¬ 
lected spam traffic from a public facing mail server, creating a laboratory testbed to mimic 
IPv6 spamming behavior, and correlating spam data to a Border Gateway Protocol (BGP) 
dataset to analyze spamming behavior in IPv6. This study seeks to determine any classifi¬ 
able behaviors that can aid malicious traffic mitigation techniques. Ultimately, new metrics 
could be used to classify spam over IPv6 based on traffic characteristics alone, resulting 
in less complex filtering at endpoint Mail Exchangers (MXs), a denial of spam routing be¬ 
tween autonomous systems (ASs), and a more concrete understanding of malicious traffic 
techniques in IPv6 [10]. 

1.1 Motivation 

The adoption of IPv6 is experiencing an increasing trend, 10 fold since 2008, which re¬ 
plenishes usable IP addresses for networked devices but also allows abusive and malicious 
parties to exploit new and existing vulnerabilities [3]. 
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Figure 1.1: Google User IPv6 Adoption Statistics, from June 2014 [3] 


For example, as a result of expanding use of IPv6, it is hypothesized that a significant 
amount of spam will eventually utilize IPv6 as its delivery protocol in order to take advan¬ 
tage of the exploitability of an increased address space. Such an increased address space 
affords spammers the ability to use a different source address for each spam message sent, 
which could allow the spammer to evade spam filters based on IP addressing and reputa¬ 
tional data. Additionally, the IPv6 address space is so incredibly large that it is highly likely 
that a single device has multiple IPv6 addresses. This capability will allow an attacker with 
a single device to use the different interfaces to send multiple instances of malicious traffic, 
which would minimize the attackers footprint from a single source perspective and reduce 
the likelihood of malicious traffic attribution. Due to this anticipated rise in abusive IPv6 
traffic, more studies need to be conducted into the behavior of spam in IPv6 [11]. While 
this thesis focuses on collected, real-world IPv6 spam analysis from a large production 
IPv6 domain, the lessons learned offer more general guidance for understanding the use 
and evolution of various kinds of malicious IPv6 traffic. If a concrete metric is established 
of spam using IPv6, numerous organizations, including the U.S. government, could benefit 
from a more robust method of discovering, filtering, and defeating malicious IPv6 traffic. 
Today, a plethora of work has been focused on spam behavior and its filtering in IPv4. 
Previous work on IPv6 spamming behavior has been largely theoretical and lacked any 
true measurement of IPv6 spamming behavior observed “in the wild.” Thus, we attempt to 
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observe and classify spam data and associate known spammer techniques used in IPv4 to 
those used in IPv6. 


1.2 Research Questions 

This thesis relies on collected IPv6 spam data from an enterprise level production domain. 
Using captured IPv6 spam packets from our production domain, experimental data col¬ 
lected from our laboratory spamming testbed, and BGP routing updates, we seek to char¬ 
acterize abusive IPv6 traffic. In doing so, we explore the following: 

• Does a classifiable relationship exist between IPv6 spam and BGP routing behaviors? 

• Is the larger IPv6 address space a significant resource for a spammer? 

• If attackers exploited the large IPv6 address space and constantly changed their IP 
address would the behavior be detectable? 

• Does a set of spam behaviors or metrics exist that would allow for spam to Mail 
Transfer Agent (MTA) correlation? 

• Does a discernible set of spamming characteristics exist in IPv6 traffic that would 
allow an effective spam filter to be built? 

1.3 Contributions 

Our research efforts in malicious IPv6 traffic analysis yielded the following findings: 

• IPv6 spam is two orders of magnitude less prevalent than IPv4 spam as measured in 
our example. com dataset. 

• Although standards are clearly defined in various RFCs, network configurations, 
Operating Systems (OSs), and MTAs do not always follow most preferred network 
configuration preferences, making default behaviors difficult to identify. 

• BGP spectrum agility is present and can be measured in IPv6 spam, but does not 
exactly mirror previously observed spectrum agility methods in IPv4. 

• We did not discover any behaviors that would suggest that malicious IPv6 traffic is 
exploiting the vast IPv6 address space. 
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1.4 Thesis Structure 

The remainder of this thesis is organized as follows: 

• Chapter 2 discusses spam behavior and filtering, BGP routing, malicious traffic, and 
related IPv6 malicious traffic studies. 

• Chapter 3 focuses on our production domain experimental setup, our laboratory spam 
replay domain, and introduces our efforts to correlate BGP routing behavior with 
IPv6 spam. 

• Chapter 4 provides all the results from our laboratory spamming testbed analysis, 
the correlation of collected spam data to University of Oregon Route Views Project 
(RouteViews) multithreaded routing toolkit (MRT) updates, and inferred IPv6 spam 
classification characteristics. 

• Chapter 5 details thesis research conclusions and recommendations for future re¬ 
search areas related to this work. 
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CHAPTER 2: 
Technical Review 


The deployment and use of IPv6 has steadily increased since its inception in 2008, moti¬ 
vated in part by IPv4 address depletion, economic incentives for large-scale organizations, 
and promotional efforts [2], [5], [6], [12]—[14]. Events such as these have presented the 
possibility to understand, characterize, and protect the IPv6 Internet. There has been a 
significant body of work that characterizes IPv4 traffic, especially when discussing the be¬ 
haviors and characteristics of spam traffic. Spam traffic is considered network traffic that 
can be classified as unsolicited bulk email. Oftentimes, a single source with resources will 
send out a mass volume of advertisement emails to incredibly large email lists that are pub¬ 
lic available for less than a few hundred dollars. A spamming bulk mailer requires a small 
amount of start-up cost and investment, which is well worth the effort if a spammer makes 
even a fraction of the sales of the products contained within the spamming messages sent 
each day [15]. While spam can be an excessive annoyance of poorly crafted advertising 
emails, they can sometimes contain attachments or links to scam offers, phishing attempts, 
or malicious code. Even though spam can come in many different forms, the typical moti¬ 
vation behind a spammer’s efforts is financial gain, either through product purchasing or the 
compromise of personally identifiable information [16]. Unfortunately, much less attention 
has been applied to spam traffic on IPv6. The immense IPv6 address space and protocol 
differences create new vulnerabilities for exploitation and may cause a resurgence of previ¬ 
ously solved security flaws. This chapter will review and discuss the nature of spam, how 
spam permeates the Internet, and previous efforts to research and classify malicious traffic. 


2.1 Malicious Traffic 

Malicious or abusive Internet traffic comes in many forms and has a variety of characteris¬ 
tics. A commonly recognized form of malicious traffic is spam. In order to properly classify 
malicious traffic, it is import to understand how it operates and why it is so pervasive. A 
spammer can send a vast volume of abusive traffic simply because spam can be easily auto¬ 
mated, requires little management overhead, and is hard to attribute to a specific person. To 
focus on how spamming actually happens, one must look to how the Simple Mail Transfer 
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Protocol (SMTP) is exploited by a malicious actor. A legitimate email message is routed 
via SMTP from a sending MTA to a receiving MTA. While spam messages can be routed 
like legitimate email, spammers often take advantage of poorly configured or compromised 
networked machines, allowing the spammer to spam the receiving MTA directly without 
relaying through a designated MTA. By circumventing the designated MTA, spammers are 
also able to evade the spam filtration mechanisms deployed at the designated MTA. Spam 
messages can come in a variety of types, from general advertisements to specifically crafted 
messages at a target host machine. This wide range of malicious message variety makes it 
difficult to conduct message content filtering without knowing what the message is going to 
be before hand. As a result, SMTP abuse is mitigated IPv4 through content filters, firewall 
configuration, and extensive use of Domain Name System blacklists (DNSBLs), which use 
mail filtering lists based on known spamming hosts and IP addresses. Regrettably, anal¬ 
ogous abusive traffic solutions for IPv6 spam are not as common nor well-tested. Since 
mitigation strategies in regard to IPv6 abusive traffic are limited, it is envisioned that IPv6 
will have an expanding role as an exploitation vector. In many cases, so long as an IPv6 
traffic route exists, IPv6 traffic independence from IPv4 allows for different traffic paths 
and network policies that may be misconfigured or less securely due to little experience or 
use, such as those in Figure 2.1. 

Specifically, a firewall might block outbound Transmission Control Protocol (TCP) port 25 
traffic on IPv4, but not on IPv6. A great deal of research effort is needed to discover and 
repair the numerous attack vectors that exist within IPv6, hopefully, prior to discovery via 
a wide-scale victim exploitation event. 

2.2 BGP Routing 

A significant portion of our experiment relies upon examining BGP routing to determine if 
IPv6 spammers are exploiting BGP and whether they can be detected through BGP obser¬ 
vations. Therefore, a brief look into BGP behavior is needed. In order for data to traverse 
from one end point to another, BGP is the preferred inter-domain protocol to route traffic 
between disassociated networks. BGP relies upon network boundaries, known as ASs, to 
exchange global traffic. Every logical part of the Internet exists within an AS, but each 
AS has various policies and preferred connections to other ASs that are determined by net¬ 
working, economics, and politics. In order for ASs to communicate and advertise their 


8 




services, BGP exchanges reachability information via Open and Update messages. Of spe¬ 
cific importance is the Update message, as it allows BGP peers to send enough routing 
information to establish a graph of relationships between ASs in order to select the correct 
path for routing traffic [18]. 

To further study the behaviors of BGP, the Advanced Network Technology Center at the 
University of Oregon has created the Route Views project. With Route Views, any user 
is able to view historical routing information from BGP Update messages. Route Views 
servers peer directly with other BGP routers to record BGP Update messages and store 
them into MRT formatted routing information base (RIB) files [19]. The Route Views data 
repository effectively allows a user to return to a desired date and time and “replay” BGP 
routing behavior. The BGP replay ability afforded by Route Views has been critical to the 
study of malicious traffic analysis by allowing the correlation of malicious traffic to unique 
BGP behaviors [20]. 
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2.3 Related IPv6 Malicious Traffic Studies 

In order to discover malicious traffic solutions in IPv6, it is vital to study and acknowl¬ 
edge previous areas of research that contain elements of possible answers to our research 
questions. For this study, we researched two key areas: spam and IPv6. 

2.3.1 Spam 

While there have been numerous spam-related studies, one study in particular is most rel¬ 
evant to our present research. Ramachandran and Feamster [20] studied several behaviors 
of network-level spammers and were able to successfully identify behaviors that aided in 
combating spam. Rachachandran et al. found that IPv4 sources of spam have a differ¬ 
ent distribution as compared to the sources of legitimate mail, that a small number of ASs 
account for nearly 40% of all of their measured spam, that most spam originated from Win¬ 
dows hosts, and that spammers used a technique known as BGP spectrum agility to remain 
untraceable [20]. Rachachandran’s et al. lessons learned as a result of their studies consist 
of fully identifying the spam host for better spam filtering, using aggregate data to identify 
nefarious behavior, securing the Internet routing infrastructure, and combining network- 
level properties into trusted spam filters to aid in efforts to mitigate spam. Unfortunately, 
all of Rachachandran’s et al. work was only conducted over IPv4 spam, which limits its 
application to IPv6 spam mitigation techniques. In contrast, this study seeks to perform 
several similar analyses to observed IPv6 spam traffic. 

2.3.2 IPv6 Prevalence and Challenges 

Until recently, there has been only a small amount of IPv6 traffic on the Internet. Dhamd- 
here et al. used BGP data to analyze the growth and performance of IPv6 on the Inter¬ 
net [13]. Dhamdhere et al. found that IPv6 is slowly growing each year, is more prevalent 
in Europe and the Asia-Pacific region, and that IPv6 performance measurements are com¬ 
parable to IPv4 performance measurements as long as the AS-level paths are the same. 
When the AS-level paths differ, however, IPv6 performance can suffer in a drastic way. 
World IPv6 launch was regarded with optimism in the hopes that IPv6 awareness would 
bring in a large, new support for IPv6. While World IPv6 launch did have an impact, it was 
not as influential as desired [14] and more IPv6 topology and routing data are needed for 
comparison to IPv4 data. 
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2.3.3 IPv6 Client Adoption 

Even though IPv4 addresses have been exhausted, a rapid adoption remains to be seen in 
wide-scale measurements of IPv6 clients. Zander et al. sought to quantify exactly how 
many clients connected to the Internet are using IPv6 [21]. Using Google ads to deliver 
a custom IPv6 capability test to web clients over a period of 10 months. Zander et al. 
established a dataset that demonstrated that while IPv6 capable Domain Name System 
(DNS) servers have increased 60% since IPv6-day 2011, client IPv6 adoption remains at a 
very low rate [21]. Even though the IPv6 client adoption rate remains generally low, more 
and more clients are still adopting IPv6, which would also allow native client resources 
to utili z e IPv6. This is an important point, because if a client is IPv6 capable and infected 
with malware, that malware will most likely have an IPv6 communications capability. With 
the less-than-wide-scale adoption and the limited security strategies implemented in IPv6, 
a spammer sending malware or controlling a spamming botnet in IPv6 would most likely 
have more success since anti-spam efforts in IPv6 are still under significant development 

[HI 

2.3.4 IPv6 and Spam 

As they investigated why IPv6 was not being rapidly adopted as the preferred IP for MXs, 
Kosik et al. hypothesized that the persistence and sophistication of spam reduces the incen¬ 
tive of deploying IPv6 within a network [11]. Since the effectiveness of a DNSBL relies 
upon direct addressing and the reputation of a mail sender, Kosik et al. believed that the 
massive IPv6 address space would erode the DNSBL advantage. According to Kosik et 
al., IPv6 DNSBLs would need to use a whitelisting method of filtration. That is, only IPv6 
addresses known to be reputable are allowed through the mail filter. Due to the vast supply 
of IPv6 addresses, the benefits of caching DNSBL lookups become negligible, leading to 
an added latency on the side of the MTA, computation on the DNSBL host, and bandwidth 
to both. Otherwise a spammer can use a different IPv6 address with each spam and it is not 
computationally feasible to check every single possible IPv6 address against a DNSBL. 

While IPv6 whitelisting may seem effective, it would discourage IPv6 connectivity across 
ASs and exhibit a tedious burden for the system administrator. As a result of IPv6 DNSBL 
problem complexity and the exploitation vectors afforded to spammers in IPv6, the more 
expensive, less effectual content-based spam filtering solution would be needed on IPv6 
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mail servers. The cost alone in employing content based MTA spam filtration could dis¬ 
suade any organization from making the migration from IPv4 to IPv6. Kosik el al. had 
some well thought out ideas in regards to the possible effects of spam on IPv6, but few 
measurements were made within their study to substantiate their claims. 

Another piece of research that sought to identify anti-spam approaches in IPv6 was the 
work by Rafiee el al. [22]. This research contains a great deal of background information 
on spam and the key characteristics of IPv6, while presenting some stimulating ideas on the 
security implications of spam in IPv6. For example, Rafiee et al. suggest that a spammer 
could use Bayesian poisoning techniques coupled with DNS DoS to bypass spam filtering 
in IPv6 networks. Rafiee et al. also identify the prodigious IPv6 address space as a spam¬ 
mer’s resource and suggest that the IPv6 network prefix can be whitelisted in an internal 
network and that the Internet could use some sort of router prefix blacklisting method. The 
spam prevention methods discussed in this research have limited contributions, as some 
countries’ Internet Service Provider (ISP)’s policies require periodic changes to routing 
prefixes, which has the second order effect of needing to also change router prefix black¬ 
lists. 
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CHAPTER 3: 
Methodology 


A common technique used to measure spamming behavior usually involves the use of a sac¬ 
rificial target domain that entices spammers to send spam in order to conduct the desired 
experimental measurements. This type of target domain is often referred to as a “honey- 
pot" or “honey domain,” and it creates an appearance of a legitimate, exploitable target. 
While this is an effective experimental method when observing spamming behavior, there 
is little or no legitimate mail activity. The lack of legitimate email accounts often causes 
difficulty in inducing spammers to target the honeypot. To maximize the effectiveness of 
our experiment and further the study of real-world IPv6 deployment, we conducted our 
measurements using a production domain that is in active use. By using a live, corporate 
production domain with several thousand users, we were able to study non-random unso¬ 
licited mail activity over an extended period of time. This chapter will briefly discuss the 
experimental architecture used on our corporate production domain, which we will here¬ 
after refer to as “example, com,” but will also focus on our laboratory spamming testbed 
and our BGP correlation algorithm. 

In addition to observing and collecting spam from a real-world production domain, we cre¬ 
ated a laboratory testbed environment that mimics the production domain’s configuration. 
This testbed allows us to produce a comparative dataset and will hereafter refer to it as 
“test. com.” The intent of our laboratory testbed is to re-create observed spamming be¬ 
havior using various MTAs in order to fingerprint default MTA behavior that can be applied 
to metrics and used for analysis. While this MTA categorization is by no means all encom¬ 
passing, it does provide some insight into default MTA behaviors and provides enough 
information to use as a MTA classification tool that can be applied to our example. com 
measurements. 

The final piece of our experimental methodology relies on our BGP correlation algorithm, 
which replays our collected spam dataset from example, com against BGP Update mes¬ 
sages in order to determine the state of the BGP routing table at the time the spam message 
was sent. By using the archive of BGP messages, we can determine the age of the BGP pre- 
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fixes corresponding to IPv6 spam origins to determine if a measurable relationship exists. 
Similar to Rachachandran et al., we sought to identify whether spammers use BGP IPv6 
spectrum agility, which is an obfuscation technique that consists of briefly announcing IP 
address space, usually hijacked IP addresses, from which to send spam and the routes to 
that IP address space once the spam has been sent [20]. 


3.1 Experimental Production Domain 

The spamming measurements conducted on our live production domain constitute robust, 
initial efforts of our experiment. While a great deal of configuration, experimentation, and 
analysis has been conducted on our example.com measurements, that work was largely 
performed by our research colleagues. Instead, the focus of this thesis relies on the setup 
and validation of our laboratory spamming testbed and tying in our example, com results 
with known BGP spamming trends. However, we review the configuration of the exper¬ 
imental production domain here to provide the relevant background on our subsequent 
analysis. 


3.1.1 Configuring example. com 

The first step in designing the experiment was selecting a protocol that would enable mea¬ 
surement of abusive IPv6 traffic within a production domain. For our purposes, we selected 
SMTP because it has built-in failover capability to facilitate using IPv6 when IPv4 connec¬ 
tions are deemed abusive, is highly configurable, and is a known choice for spamming 
activity [23]. Next, we created a dual-stack, tertiary MX server to act as our spam sensor. 
With the sensor in place within our production domain, we created configuration parame¬ 
ters that would still allow legitimate mail flows through our primary (smtpl) and secondary 
(smtp2) MXs while directing known spammers or less favorable mail flows to our spam 
sensor, as demonstrated in Figure 3.1. 

Further configuration metrics regarding example.com will be explained in Section 3.2, 
which had a purpose of mimicking the spamming behaviors seen within our large scale 
production domain. 
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3.1.2 example. com Dataset 

To properly record all measured spam events from example. com, we collected packet cap¬ 
ture file (pcap) traces of all incoming TCP connection attempts, both IPv4 and IPv6, over 
port 25 at the sensor MTA for nearly 11 months, January through November 2013. Each 
pcap included timestamps, IP, and TCP information of spamming connections, in addition 
to using the following methods to enumerate each spam attempt: 

• Autonomous System Number (ASN) for the source IP, using Team Cymru’s IP-to- 
ASN lookup [24] (for 6to4 IPv6 addresses [25] we used the ASN associated with the 
IPv4 address of the 6to4 gateway, which is embedded in the IPv4 address). 

• Reverse Domain Name System (RDNS) name for the source IP, obtained by perform¬ 
ing a PTR lookup on the IP. 1 

• OS version and flavor of the sending MTA, as interpreted by passive operating system 
fingerprint (pOf) [26]. 

• Reputation data for the sending IPv4 address from two sources: CYREN 2 and Spam- 
Cop. 3 

These spamming record data fields acted as a driving force in the development of our ex¬ 
perimental spamming testbed. Through observation and analysis of pOf records, we deter- 

DNS Queries: Reverse DNS Lookup and PTR Query [http://www.dnsqueries.com/en/reverse_lookup.php/] 
2 CYREN Security Services: Embedded IP Reputation [http://www.cyren.com/] 

3 SpamCom Reporting Service [http://www.spamcop.net/] 
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mined the most frequently used OSs by spammers in order to use same OS in our test. com 
experiment. Finally, we used each source IP address and its measured example. com spam¬ 
ming timestamp to run during each instance of our BGP correlation algorithm, which will 
be further explored in Section 3.3. 

3.2 Laboratory Spamming Testbed 

To properly establish the context of test. com, we must review in detail its configura¬ 
tion, which mirrors the configuration used in example. com. This section will explain how 
test. com was configured, exchanged email, and conducted measurements to those simi¬ 
larly seen in example. com. 

3.2.1 Experimental Design 

A domain is configured to exchange email through the use of MX records in the DNS, 
with each MX record having an assigned target mail server with a designated name (e.g., 
TMSl.test.com) and numeric preference value (e.g., 10). Using SMTP, the delivering 
MTA attempts to send mail to the servers with the lowest (most preferred) preference value. 
The overall purpose of the entire experiment was twofold: to determine MTA spamming 
behavior given both IPv4 and IPv6 as available transport protocols and to understand how 
large a role the spamming MTA’s OS plays in spamming behavior. We therefore needed 
to ensure that network protocol selection and preference was followed. We therefore relied 
on SMTP to operate as designed [23], [27]—[29] and on the MX preference of the MTA 
names and their corresponding IPs [23], [27]. To link a mail server’s hostname to an IP 
address, DNS name resolution resolves the MX record to the A record (A) or quad-A 
record (AAAA) record delivered by the IPv4 or IPv6 protocol. The specific protocol used 
is determined by the existence of an A or AAAA in the DNS, the network availability of the 
sending server, the sending server’s local policy, and the OS used by the sending MTA. 

Our testbed DNS contained two MX records that had targets corresponding to the primary 
(TMS1. test. com) and secondary (TMS2. test. com) MTAs, both of which were configured 
as IPv4 MTAs only (i.e., the primary and secondary MTAs had no IPv6 addresses). The 
primary and secondary MTA each had an A record and rejected incoming mail requests from 
all IP addresses. The rejected mail message consisted of a SMTP 421 error code ("service 
not available") that would follow after the SMTP HELO command from the sending MTA. 
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This rejection was achieved through a script that ran on TMS1 and TMS2 that would attain 
that same result if a sending IP to example. com correlated to an IP address contained 
within the DNSBLs or reputation policy. 

Note that this default rejection behavior is different than used in the production example. com 
setup; instead we wish to redirect all traffic in the testbed for the purposes of understanding 
individual MTA behavior. 


Name 

Type 

Pref. 

Target 

test.com. 

MX 

10 

TMS1.test.com. 

test.com. 

MX 

20 

TMS2.test.com. 

test.com. 

MX 

30 

TMS3.test.com. 

TMS1.test.com. 

A 


192.0.2.1 

TMS2.test.com. 

A 


192.0.2.2 

TMS3.test.com. 

A 


192.0.2.3 

TMS3.test.com. 

AAAA 


2001:db8: :3 


Table 3.1: DNS Resource Records Corresponding to test. com. 


We added a third MX, having IPv4 and IPv6 connectivity, a higher MX preference value, 
and pointing to TMS3. test. com, a dual-stack server with both A and AAAA records. The 
testbed’s DNS configuration is shown in Table 3.1. We refer to our third MX server as TMS3. 
TMS3 always rejects incoming connection attempts to port 25 over IPv4 and IPv6 by issuing 
TCP RST packets in response to the TCP SYNs. 

Thus, to simulate the rejection behavior of DNSBLs and to minimize the misuse of net¬ 
work resources, we issued SMTP 421 error codes, which exist in the application layer, in 
response to connections at TMS1 and TMS2. For rejections at TMS3, we issue the TCP RST at 
the transport layer, ensuring that any connect attempt is actively rejected and understood 
by the sending MTA. For example, given our experimental setup demonstrated in Table 3.1 
and Figure 3.2, a spammer attempting to send mail to anyone at test. com would first try 
TMS1, then TMS2, then TMS3. 

The purpose of the TCP RST for each spam attempt at TMS3 was to understand default 
network behavior of commonly used OSs and MTAs as a baseline for comparison with 
observed behavior on the production domain. 


17 




Figure 3.2: test. corn’s Operational Behavior 


3.2.2 Testing MTAs 

We configured three different dual-stack “spammers” utilizing well known MTAs: Mi¬ 
crosoft Exchange [30], Sendmail [31], and Postfix [32], as shown in Table 3.2. In each 
iteration of our spam testbed experiment, we recorded each MTA message delivery attempt 
to an email address within our test domain for a measurement period of 36 hours. Of note, 
each machine within test. com used a global IPv6 address, suggesting that the sending OS 
would prefer the IPv6 address over the IPv4 address [29]. 

Name Operating System MTA Software 

spaml.private.com MS SBS 2011 ME 2010 14.01 

spam2.private.com Ubuntu 12.04 Postfix 2.9.6 

spam3.private.com Ubuntu 12.04 Sendmail 8.14.4 

Table 3.2: Test Domain Configuration 

For measurement purposes, we differentiated application-level TCP-level connection at¬ 
tempts by each spammer as seen at TMS3. We defined a maximum time between SYN pack¬ 
ets as 20 seconds, and grouped SYN packets with the same TCP source port and source IP 
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address. We refer to the resulting group as a connect attempt, referring to the socket call 
by the same name to produce that behavior. Our goals in conducting this experiment were 
three-fold. First, we wanted to characterize and explore each MTA’s attempts to establish 
connections over IPv4 and IPv6 at TMS3, over time. Second, we wanted a set of control 
data to which we could compare our measurements observed on the production domain. 
Third, we wanted to understand how each sending MTA chose port numbers while making 
repeated attempts to send mail over time. 

Some of the specific questions we sought to answer with this testbed are the following: 

• For each connection retry by the MTA, does the connection try both IPv4 and IPv6, 
and which does it try first? 

• How much time elapses between IPv4 and IPv6 connection attempts associated with 
each spammer “retry?” 

• How many SYNs are sent for each observed connect attempt, and how much time 
elapses between each connect attempt? 

• How many distinct connect attempts are observed over an extended period of time? 

• What is the port usage behavior observed with associated IPv4 and IPv6 connect 
attempts? 

Of particular interest in our experiment was the behavior of port selection and how OS 
and MTA choice affected the port numbers used. According to IANA, port numbers that 
range from 49152-65535 are titled “dynamic” or “ephemeral” ports. Port numbers in this 
range are set aside for dynamic use and are predesignated as being unable to be assigned by 
IANA as system or user ports. Proper ephemeral port usage policy calls for an application 
to select any port available within the range, so long as the application does not assume 
or identify a specific port number to be used consistently [33]. Ephemeral port number 
selection play an important role in our experiment if we are to successfully identify sending 
MTA characteristics. Determining whether an OS or MTA uses random, sequential, or 
cycled ephemeral ports for each connect attempt, as well as what base port offset is used, 
was a significant part of mapping observed behaviors to OSs. Using these feature, we 
seek to establish a measurable baseline of commonly used mail applications and OSs. The 
results of our test. com experimentation, found in Table 4.3, will be further discussed and 
analyzed in Chapter 4. 
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3.3 BGP Correlation Algorithm 

As noted by Rachachandran et al., some sophisticated spammers are exploiting the weak¬ 
nesses of the Internet routing infrastructure via short-lived BGP route updates from hi¬ 
jacked prefixes [20]. While their work identified BGP spectrum agility as a unique and 
measurable spamming behavior, their work was focused only on IPv4. We sought to expand 
their efforts to IPv6 to see if, in fact, IPv6 spammers were using the same BGP spectrum 
agility behavior to complement their spamming efforts. As previously mentioned, we relied 
on Route Views to provide all BGP updates for comparison to our example. com spamming 
measurements. To fully comprehend BGP spectrum agility, we must first understand what 
it means to “hijack” a BGP prefix. 


Before/After Hijack 




During Hijack 


Figure 3.3: Example BGP Route Hijacking 


When BGP is used to route traffic between AS, it relies upon advertised AS routes to send 
traffic from source to destination AS. The design of BGP allows each AS to implicitly 
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trust a peered AS, which allows a malicious person to masquerade as a peered AS and 
advertise a new route. This spoofing of a newly advertised BGP route is known as BGP 
hijacking 3.3. The hijacker then illegally selects a block of IP address space that does 
not belong to them and advertises the IP address block as a spoofed AS route. Once the 
attacker advertises this spoofed route, the attacker appears to come from an IP address that 
does not belong to them and their traffic appears to be routed from the spoofed, remote AS. 
When the attack is complete the attacker sends an AS withdrawal message to restore the 
AS path advertised prior to attacker interference. Malicious persons using this technique 
often employ unused network addresses, known as “dark net” addresses, to reduce the 
likelihood of the IP address ending up on a DNSBL. Figure 3.3 depicts an example of a 
prefix hijacking and is a proven method that spammers use to obfuscate themselves in order 
to avoid spam filtration mechanisms [20]. 


3.3.1 bgpdump 

Route Views stores their BGP update messages in a RIB file in the popular MRT format, 
which is a binary formatted file. In order to read each RIB, we relied on a well known BGP 
tool called bgpdump. 4 bgpdump is able to convert each RIB file to human-readable, ASCII 
text then parse each line of the file in order to allow the user to query the desired BGP data. 
For the purposes of our algorithm, bgpdump successfully parsed the timestamp, message 
type, source IPv6 address network prefix, and the origin AS of each BGP Announce or 
Withdrawal message. 


3.3.2 Algorithmic Description 

Once we have the parsed data from bgpdump, we use each instance of recorded spam from 
our example. com experiment and search all BGP updates from the time of one instance of 
spam to the next instance of spam time. Our algorithm identifies any IPv6 network prefix 
BGP announcements or withdrawal messages that have an exact match to the recorded IPv6 
spamming IP address and records some other statistical data. 

4 BGPDump Repository [https://bitbucket.org/ripencc/bgpdump/wiki/Home/] 
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tree — new (Radix trie ) 
spamtime .previous = 0 
for each spam in database do 

spamtime.next = timestamp.spam 
re pi ayBGPupdates(s pamtime.previous, spamtime.next) 
addspam(spam, tree) 
end for 

Figure 3.4: BGP Correlation Algorithm Pseudocode 


The pseudocode seen in Figure 3.4 may be straightforward, but the elements needed for 
algorithm execution were sufficiently complex and will be further explored, along with the 
results, in Chapter 4. 
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CHAPTER 4: 
Experimental Results 


In the previous chapters we introduced our experimental design, which involved measure¬ 
ments from a real-world production domain, a simulated spamming domain, and a BGP 
correlation algorithm. In order to bring this large corpus of information together this chap¬ 
ter will focus on the results and analysis from each experiment. Our dataset suggests that 
while there is a slowly increasing amount of IPv6 activity on the Internet, it is still orders 
of magnitude less than its IPv4 counterpart. Additionally, spammers will continue to use 
proven IPv4 exploitation methods, such as BGP spectrum agility, when sending spam in 
IPv6. 


4.1 Notable example. com Analysis 

One of the aims of the example. com experiment was to associate an IPv4 address with an 
IPv6 address from dual-stacked spammers. A mechanism to accurately identify a spammer 
based solely on sent spam traffic has yet to be developed, but a few forensic details from 
the spam traffic can allow us to correlate an IPv6 address to the likely IPv4 address for 
spam attribution. The time between each IPv4 and IPv6 connect attempt, embedded IPv4 
host addresses in IPv6 addresses, ASN use, DNS PTR records, TCP source port proximity, 
IP address proximity, and overall connect behavior were all analyzed in the example. com 
experiment. Once each of those connection statistics were recorded, they were scored based 
on a naive matching algorithm that utilized a highest confidence score to associate an IPv4 
and IPv6 address. The matching algorithm was developed outside of the thesis as part of 
a collaborative work effort and is subject of a separate publication under review. However, 
utilizing the test. com experimental results, we verify the matching algorithm through the 
understanding of actual ground-truth behavior in our laboratory testbed. 

While there were a myriad of statistics developed as a result of example, com measure¬ 
ments, as seen in Table 4.1, there were some key points. First, 80% of the IPv6 addresses 
observed were associated with IPv4 addresses, which made up 87% in total of all observed 
IPv6 addressees within the experiment. Second, all but 1% of associated IP addresses 
used the same ASN, which can help in discovering spammers that use BGP route prefix 


23 




hijacking. Third, the Windows hosts sending spam tended to select source ephemeral port 
numbers within a small range of only 500 different ports from the initial source port num¬ 
ber for each connect attempt. This behavior was in direct contrast to Linux and other OS 
MTAs, and when combined with the results of the tool pOf, created a unique profile of a 
spamming Windows host. 


Statistic 

Count (%) 

Unique IPv6 addresses 

3670 

- IPv6 assoc. 

2907 (79) 

- IPv6 assoc. - first rogue 

575 (15.7) 

- IPv6 assoc. - single IPv4 

2722 (74.2) 

- IPv6 unassoc. - 6to4 w/ embedded IPv4 

42(1.1) 

IPv6 connect attempts 

135770 

- With IPv4 association 

118177(87) 

- Close proximity assoc. 

85640 (63) 

- Rogue w/ previous close proximity 

32537 (24) 

IPv6 Associations 

3319 

- Embedded IPv4 host 

269 (8) 

- Embedded IPv4 host (6to4) 

215 (6.5) 

- ASN 

3281 (99) 

- PTR 

309 (9.3) 

-OS 

1823 (55) 

- OS mismatch 

40(1.2) 


Table 4.1: Summary of Results from IPv6/IPv4 Address Association. 


Once IPv4 and IPv6 address association was accounted for, a holistic analysis was con¬ 
ducted on IPv6 spamming attempts over the course of the entire experiment. As seen in 
Figure 4.1, there were roughly two orders of magnitude more attempts in IPv4 than IPv6 
with an average of 2908 IPv6 connect attempts per week. Of the IPv6 spammers, 20% 
attempted to connect to the sensor MTA only once, 70% up to 10 times, and 5% more 
than 100 times. Another interesting trend was also observed, see Table 4.2, in the OSs of 
spamming hosts using IPv4 and IPv6. The various results from this experiment led to the 
following conclusions: that IPv4 spamming activity is still about 100 times more pervasive 
than IPv6 spamming activity, that the experiment was unsuccessful in determining the ex¬ 
act profile of any IPv6 spammer measured in the experiment, and the amount of IPv6 traffic 
did not increase over the course of the experiment. 
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Figure 4.1: Weekly connect Attempts to example. corn’s Sensor MTA. 


os 

IPv4 Hosts 

IPv4 Attempts 

IPv6 1 
Assoc. 

losts 

Unassoc. 

IPv6 A 
Assoc. 

ttempts 

Unassoc. 

Windows 

593150 (62.41) 

9109137(49.77) 

293 (8.40) 

105 (13.76) 

18492(13.43) 

2652 (15.07) 

- Windows NT 

46027 (4.84) 

1928541 (10.54) 

13 (0.37) 

4 (0.5) 

77 (0.06) 

28(0.16) 

- Windows XP 

285454 (30.04) 

4355433 (23.80) 

2 (0.06) 

2 (0.3) 

12(0.01) 

12(0.1) 

- Windows 7 

261669 (27.53) 

2825163 (15.44) 

278 (7.97) 

99(12.8) 

18403 (13.37) 

2612(14.8) 

Linux 

66876 (7.04) 

7025690 (38.38) 

1900 (54.46) 

317 (41.55) 

96976 (70.43) 

11842(67.31) 

- Linux 2.0 

7 (0.00) 

21788 (0.12) 

- 

- 

- 

- 

- Linux 2.2 

7353 (0.77) 

339103 (1.85) 

52(1.49) 

26 (3.4) 

1112(0.81) 

271 (1.5) 

- Linux 2.4 

7367 (0.78) 

723900 (3.95) 

163 (4.67) 

21 (2.8) 

3580 (2.60) 

273 (1.6) 

- Linux 2.6 

29431 (3.10) 

4464112(24.39) 

492(14.10) 

115 (15.1) 

30605 (22.23) 

6364 (36.2) 

- Linux 3.x 

22718 (2.39) 

1476787 (8.07) 

1193 (34.19) 

155 (20.3) 

61679 (44.79) 

4934 (28.0) 

Solaris 

198 (0.02) 

24781 (0.14) 

- 

- 

- 

- 

Mac OS X 

999 (0.11) 

128024 (0.70) 

- 

- 

- 

- 

FreeBSD 

2066 (0.22) 

261686(1.43) 

- 

- 

- 

- 

OpenBSD 

20 (0.00) 

416(0.00) 

- 

- 

- 

- 

Other 

145 (0.02) 

19232(0.11) 

7 (0.20) 

3 (0.4) 

64 (0.05) 

9(0.1) 

Unknown 

286902 (30.19) 

1734496 (9.48) 

1289 (36.94) 

338 (44.3) 

22163 (16.10) 

3090 (17.6) 


Table 4.2: example. corn’s Inferred Operating Systems for Spamming Hosts 


4.2 MTAs Behaving Badly 

While the experimentation and analysis conducted with example, com was extensive, we 
felt that it would be advantageous to understand the default network behaviors of com- 


25 















































monly used OSs and MTAs as a baseline for comparison with the observed behavior in 
example. com. As previously mentioned in Section 3.2, we sought to explore how a MTA 
preferred each IP version, port numbers, and to identify default behaviors. To determine 
a common behavior for each MTA and mimic the sending of spam, we observed how 
each MTA would attempt to send a single message over a period of 48 hours to a sensor 
MTA. Our sensor MTA would continually send a TCP RST for each attempt by the send¬ 
ing MTA to deliver the “spam” message, which mirrored the behavior of example. com’s 
sensor MTA 3.1. Once we parsed and analyzed the pcaps from our experiment, we gen¬ 
erated Table 4.3. These results were instrumental in identifying anomalous behavior from 
example.com. 


Software 

Sendmail 

Postfix 

MXS 2010 (IPv6) 

MXS 2010 (6to4) 

IPv6 or IPv4 first? 

IPv6 (100%) 

IPv4 (62%) IPv6 (38%) 

IPv4 (100%) 

IPv4 (100%) 

Time per connect attempt? 

10 min 

83 min, 70 min thereafter 

10 min 

10 min 

SYNs per connect attempt? 

i 

i 

3 

3 to 6 

Port selection? 

Ephemeral 

Ephemeral 

Ephemeral 

Ephemeral 

Port range (IPv4)? 

49224-50091 

42502-57419 

6331-65741 

7166-65079 

Port range (IPv6)? 

57626-58493 

35469-35603 

6334-65473 

7170-63309 


Table 4.3: Results from test. com Spamming Experiment 


While we sought to correlate results from test. com to the observed behaviors of 
example. com, it is important to note that behaviors can be the result of either the ap¬ 
plication or the OS. For example, the number of SYNs per connect attempt is believed to 
be an OS specific behavior. Furthermore, there could be conditions within each OS that 
make behavioral association of the example. com dataset complicated. We were unable to 
make an attributable claim when comparing example . com spam behavior because we have 
no way of knowing whether the spamming behavior was the result of native OS procedures, 
an OS specific MTA, or an OS infected with spamming malware. 

4.2.1 Sendmail v8.14.4 

When comparing all three MTAs, Sendmail exhibited the most expected network pref¬ 
erence characteristic of an MTA in that it always preferred IPv6 over IPv4. With each 
connect attempt observed, we detected that IPv6 was always tried prior to IPv4. We ob¬ 
served only one connect attempt per selected IP. The time difference observed between 
the IPv6 connect attempt and the IPv4 connect attempt that followed was less than half a 
second and should be considered inconsequential. Each SMTP connect attempt consisted 
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of only one IPv4 and one IPv6 connection and the first retry attempt occurred 3 minutes 
49 seconds after the first SMTP connect attempt. Following the first SMTP connection 
retry, Sendmail attempted to connect every 9:55 seconds. Sendmail chose an ephemeral 
port for the first SMTP connect attempt, one for IPv4 and one for IPv6. Each subse¬ 
quent connect attempt saw an increase in the port number by some value between three 
and seven, but throughout the entire transaction, it can be observed that the IPv4 and IPv6 
port numbers stayed in range of their initial ephemeral port number. While this port selec¬ 
tion behavior will change with each new email, we observed that Sendmail incrementally 
increased the port number with each email retry, beginning with an arbitrary source port. 

A particular observation from the example. com dataset regarding the behavior of the Send¬ 
mail MTA, was the fact that Sendmail would periodically retry to send mail from the same 
source IP address and port minutes, hours, and days after initial connect attempts. This be¬ 
havior was a particular incident observed in the example. com dataset that we did not repro¬ 
duce in test. com, so we are unsure as to how widespread the behavior is within real world 
production domains. After further investigation and data correlation in the example. com 
dataset, we were able to determine that this anomalous behavior of same source IP and port 
reuse was conducted by a single organization. We were able to confirm this relationship by 
directly contacting the organization that originally sent the mail to determine which MTA 
was currently being employed within their organization. We then looking for this pattern of 
behavior in our test. com dataset and we unable to find any instance of same source IP and 
port re-use. Unfortunately, we were unable to account for this behavior in every instance 
of connect attempts over extended time periods observed in the example. com dataset and 
we are unsure of the Sendmail configuration that caused this unique behavior. 

4.2.2 Postfix v2.9.6 

By observing our Postfix measurements within our MTA testbed, we can say that Postfix 
comparatively performed in the most random way. Postfix tried both IPv4 and IPv6 with 
one SYN during each connect attempt and attempted IPv4 half a second before attempting 
IPv6. The time period that elapsed between each SMTP connect attempt was the most 
irregular of the three MTAs. The next SMTP connect attempt following the first was at 
7:50 seconds, followed by a 10 minute, 20, and 40 minute retry attempt. Following the 
40 minute retry interval, Postfix reattempted an SMTP connect every 10 minutes there- 
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after. Postfix exhibited the same port behavior as Sendmail by choosing an ephemeral port 
to connect to for IPv4 and IPv6, but the two MTAs did not choose the same ephemeral 
ports. Following the first SMTP connect attempt’s port number, each reattempt saw the 
port number increased by some value between three and seven due to other source port 
allocations created in the interim. 

The Postfix results regarding IP preference selection was particularly interesting. Accord¬ 
ing to Postfix documentation, IPv6 is preferred over IPv4 when both protocols are available 
and, in versions 2.8 and later, allow the administrator to specify which protocol should be 
preferred [32]. When both protocols are available, IPv6 should be used first by default 
but a disclaimer in the documentation informs readers that default settings are “unsafe.” 
If mail delivery is attempted when there is an IPv6 outage on Postfix, the message could 
fail to deliver even if a route still exists using IPv4 5 . Also, if the protocol preference is set 
to “any,” there will be an equal preference for both IPv4 and IPv6, making the selection 
non-deterministic. Finally, we observed the same random protocol preference selection in 
both example. com and test. com, which assisted and validated our IP address association 
from example. com. 

4.2.3 Microsoft Exchange 2010 vl4.01 

Microsoft Exchange Server 2010 (MXS) exhibited the most identifiable MTA behavior of 
all our measurements. Two different types of experiments were conducted with MXS, the 
first using native IPv6 connectivity and the second using the 6to4 tunneling protocol. MXS 
makes connect attempts on both IPv4 and IPv6 for the initial and subsequent connections, 
with IPv4 tried first. For each connect attempt, MXS sends three SYNs for both IPv4 and 
IPv6 and each connect attempt are within a half of a second of each other. After the initial 
SMTP connect, another was launched 14 minutes later. Following the 14 minute period, 
a connect was launched every 10 minutes thereafter. During each IPv4 and IPv6 connect 
attempt, an ephemeral port was chosen by the OS for the IPv4 connect attempt and then 
the port value increased by one for the IPv6 connect attempt. This ephemeral port and 
its plus one counterpart was observed at each SMTP connect attempt. Overall, the triple 
SYNs over each connect attempt with IPv4 and IPv6 and the adjacent port numbers for each 
connect attempt are indicative of MXS behavior and could be a signature of a spammer 

5 Postfix Configuration Parameters [http://www.postfix.Org/postconf.5.html] 
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utilizing MXS. 

When evaluating each MXS spam attempt form test. com, there was very little discernible 
difference between the native IPv6 and the 6to4 results. Both methods, native IPv6 and 
6to4, tried IPv4 prior to IPv6 for each connect attempt and utilized the same time retry 
period. Also, the same ephemeral port selection behavior was present in that the port 
values increased during each connect attempt and each subsequent connect selected an 
random port number. The only observable difference that can be seen was the fact that 
6to4 employed between three and six SYNs for each connect attempt while native IPv6 
connectivity uses only three SYNs for each connect attempt. Finally, we must mention 
the lack of well-behaving protocol preference conformity observed by MXS [29]. Prior to 
each experiment, we ensured that both the Windows Server running MXS and MXS itself 
preferred IPv6 over IPv4 through the use of the ‘netsh’ command line tool 6 . Regardless of 
the prefix policy settings, every connect attempt utilized IPv4 first. This behavior suggests 
that MXS is a badly behaving MTA that ignores protocol preference established by the OS 
at the application layer. 

Finally, we analyzed the trends from test, com dataset and applied what we learned to 
the example.com dataset. Unfortunately, correlation between IPv4 and IPv6 addresses 
from example. com is a difficult and ongoing discovery process, so we were unable to 
quantify the use of IPv4 or IPv6 first as a characteristic as in Table 4.3. We were able 
to successfully determine a few other unique behaviors that allow us further insight into 
profiling our observed example . com spam behavior. Whether or not a connect attempt 
used a single SYN or a sequence of three SYNs, the IP used for each attempt, and the port 
numbers were all analyzed. 

Of note, 14% of the example. com spam data included three or more SYNs per connect, 
which parallels with our observed test. com spamming behavior of MXS. While this is 
certainly a highly correlative behavior, it is unknown whether that is because the spam 
senders are actually MXS MTAs, other MTA implementations on Windows machines, or 
Windows malware. Of note, all port numbers from example, com were ephemeral, which 
also parallels with our spamming testbed dataset. While this additional analysis of the 

6 To view prefix policy: ‘netsh interface ipv6 show prefixpolicies’ 

To change prefix policy: ‘netsh interface ipv6 add prefixpolicy address precedence-value label’ 
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Total Packets 

109138752 

SYN Count 

28898011 (IPv4) 

1315838 (IPv6) 

connect attempts 

20693161 (<3 SYNs/attempt) 
3478195 (>3 SYNs/attempt) 

Port Range 

1024-65519 


Table 4.4: Statistics from example. com Dataset 


example. com dataset displays strong related behaviors with the test. com measurements, 
it is by no means a definitive answer regarding default spamming behavior and is more of 
a demonstration of ground truth validated by a realistic production dataset. 

4.3 Tracking IPv6 Spam via BGP 


Address Type 

Type Count 

2001:: 

129 

2002:: 

2 

2400:: 

9 

2600:: 

88 

2a00:: 

237 

Network Prefix 

Count 

/16 

2 

/21 

1 

/26 

56 

129 

6 

132 

291 

/36 

3 

/40 

4 

/44 

2 

/48 

100 


Table 4.5: BGP Algorithm Statistics from BGP Episodes Under 25 Hours 

In Section 3.3, we introduced our BGP Correlation Algorithm that we developed in order 
to determine if spammers were using BGP spectrum agility in IPv6 [20]. Our intent was 
to develop an algorithm that replayed BGP update messages before, during, and after each 
recorded instance of spam from example. com. A detailed version of our algorithm can be 
studied in the Appendix. Using the IPv6 network prefix within a BGP route announcement 
or withdrawal and an example . com IPv6 spam address, we wanted to observe instances of 
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a BGP announcement containing a large IP address space, a recorded IPv6 spam attempt 
from example. com with a timestamp closely following the BGP route announcement, and 
then a BGP withdrawal of the route following the last recorded IPv6 spam attempt. We will 
hereafter refer to this sequence of events as a BGP episode. The discovery of the BGP spec¬ 
trum agility technique was initially made in IPv4 spam and although there were only a few 
spamming entities that seemed to use this technique, there were some very unique, consis¬ 
tent, observable behaviors associated with BGP spectrum agility [20]. We observed some 
instances of behavior in IPv6 BGP messages that mimics that of BGP spectrum agility. 



Figure 4.2: Length of Short-Lived BGP Episodes Lasting Under 25 Hours 


We observed some 465 instances of BGP announce and withdrawal updates, with a life¬ 
time of less than 25 hrs, that correlated with recorded IPv6 spam attempts, such as the 
BGP episode displayed in Figure 4.3. The spam attempts listed in Figure 4.3, originating 
from 2a01:111: : /32 , displayed 11 instances of BGP agility behavior lasting less than 25 
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Y=BGP Announcement 
Z=BGP Withdrawal 
X=Spam Attempt 


Timestamp 


Figure 4.3: Observation of 2a01: 111: : /32 BGP Agility Behavior 
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Figure 4.4: 2a01: 111: :/32 BGP Agility Episodes Lasting Less Than 25 Hours 


hours. Most of the BGP episodes from 2a01: 111: : /32 occurred in October 2013 and on 
October 11, 2013, there were three BGP episodes, which is highest number of instances 
of BGP episodes recorded by 2a01:111: : /32. It is important to note the 25 hour lifetime 
of the observed BGP agility behavior, as seen in Ligure 4.2, as this was the threshold that 
we set for our area of study. Anything larger than a 25 hour lifetime starts to introduce 
more likelihood that we inappropriately associate a BGP updates message as BGP spec- 
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trum agility behavior. We believe that a misconfigured border router or a poorly managed 
ISP could be falsely flagged as a spamming entity in our dataset without any limitation 
on the lifetime threshold to determine BGP spectrum agility. In evaluating Figure 4.2, we 
can see that nearly 60% of all the recorded spam associated with BGP agility behavior 
lasted less than 20830 seconds or 5 hours and 47 minutes. As seen in Table 4.5, the IPv6 
address types include 2001, 2002, 2400, 2600, and 2a00 and the prefixes include /16, 
/21, /26, /29, /32, /36, /40, /44, and /48. These IPv6 prefixes are not representative of 
observations made by Ramachandran et al. because Ramachandran et al. observed IPv4 
addresses, which came from IPv4 address space and is roughly 17 billion times smaller 
than IPv6 address space. For example, while ISPs in IPv4 are typically expected to have 
a /16 network, ISPs in IPv6 might be expected to have a /32 network. The differences in 
the network sizes observed in our BGP spectrum agility data are not uncommon due to the 
large, less-trafficked IPv6 address space [9] and the wide availability of unallocated IPv6 
networks. 

While a majority of the IPv6 prefixes list in Table 4.5 can be observed in the first 60% of 
all the recorded spam associated with BGP agility behavior, most of the IPv6 prefixes are 
of a /32 network. Ramachandran et al. originally observed and classified BGP spectrum 
agility in /8 IPv4 networks, but no previous metric existed that suggested what we might 
expect to observe in our experiment regarding IPv6 network sizes. Using Table 4.5, we can 
strongly suggest that a /32 would seem to be the most common type of prefix expected 
when observing BGP spectrum agility behavior in IPv6. Ramachandran et al. also found 
that the observed spamming IPv4 addresses were widely distributed across an advertised 
address space and that most the IPv4 addresses observed appeared only once. These dis¬ 
covered BGP agility characteristics by Ramachandran et al. are the antithesis of what we 
observed in the characteristics of our spammed IPv6 addresses. Most of the spam attempts 
within each prefix were sent from a small set of IPv6 addresses. This was surprising, as 
we expected to see a more significant exploitation of the large IPv6 address space by re¬ 
peatedly changing source IP address. Moreover, multiple episodes of spamming attempts 
to the same destination IPv6 address can be observed over an extended time period. 

Additionally, every IPv6 address type in our dataset has been allocated to a RIR and we did 
not observe the use of any “reserved” IPv6 prefixes. Finally, as demonstrated in Figure 4.5, 
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Figure 4.5: Observation of 2607: 9000: : /32 BGP Agility Behavior 


we can see an instance in which a spammer attempts multiple spamming episodes using 
BGP spectrum agility in a short period of time. 

While each spamming episode observed in our dataset is indicative of BGP spectrum 
agility, the episodes are not all uniform in their behavior. We viewed spamming episodes 
that attempted different IPv6 addresses within the prefix, some that only targeted one or 
two IPv6 addresses, and some that targeted IPv6 addresses that appear questionable in na¬ 
ture due to an address that uses “leetspeak” formatting. While the observed behaviors are 
certainly interesting and warrant study, the results are inconclusive as to whether or not all 
of our observed instances are IPv6 spammers intent on evading filtration mechanisms. We 
fully expected to catch IPv6 spammers that took advantage of the almost unlimited IPv6 
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address space by using different IPv6 addresses for every spam attempt within an episode, 
but we observed the opposite behavior; what appeared to be multiple instances of a target 
set of one to four IPv6 spamming addresses in an episode that differed by prefix. It ap¬ 
pears that spammers are attempting to use the same methods that allowed them to spam in 
IPv4 in IPv6. In order to fully understand and classify this behavior a more comprehensive 
study needs to be conducted over an extended period of time to determine a more definitive 
answer to the IPv6 BGP spectrum agility question. 
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CHAPTER 5: 
Conclusion 


This thesis sought to investigate abusive IPv6 traffic by analyzing suspicious activity at the 
MTA of a production email domain and associating it with BGP routing updates. Using two 
experimental designs, one from a live production domain (referred to as example . com) and 
one from a laboratory (referred to as test. com), we were able to collect a corpus of IPv6 
spammers and their behaviors. Additionally, we developed an IPv6 spam BGP correlation 
algorithm that utilized collected spam attempts to identify BGP spectrum agility behavior 
previously discovered as a method for malicious users to send abusive IPv4 traffic. We 
performed statistical analysis on our example. com dataset to recognize spamming profiles 
in both IPv4 and IPv6, to attempt to associate IPv4 and IPv6 spamming addresses, and to 
better understand what behaviors an IPv6 spammer might present when attempting to send 
malicious traffic over an extended time period. We provided some insight into the security 
studies of IPv6 as it emerges as a more dominant IP within the Internet. 

We found that although IPv6 was not nearly as prevalent as we expected, we determined 
that no one type of OS, IPv6 address space, or network origin dominated our IPv6 obser¬ 
vations. We also observed that there does not seem to be any preference over which type of 
IPv6 access methodology is used, be it native or tunneling technologies. We also found that 
IPv6 abusive traffic was minimal within our dataset and therefore so were the exploits of the 
large IPv6 address space. In contrast, our results of the BGP correlation algorithm proved 
insightful. We were able to discover many short-lived BGP agility behaviors that certainly 
warrant further investigation and classification. Our thesis efforts will bolster the study 
of IPv6 abusive traffic measurement and it is our hope that our work will provide a solid 
foundation for future study and mitigation efforts against abusive IPv6 Internet practices. 

5.1 Future Work 

To the best of our knowledge, this thesis work is the first time that spam measurement in 
IPv6 has been conducted in a live production domain from the point-of-view of the MTA. 
Additionally, this is the first publicly available study that seeks to correlate abusive IPv6 
traffic with nefarious BGP behaviors. While our methods for measurement and study were 
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unique in nature and necessary as IPv6 takes a more active role in the Internet, there are 
some improvements that could be made to better characterize abusive IPv6 traffic: 

Collecting the content of the spam messages from example. com. One of the shortcom¬ 
ings from the example. com measurement is the IPv4 and IPv6 spamming address associa¬ 
tion and any false positives or negatives that may have been contained within measurement 
constraints. One way to correctly classify whether or not traffic is spam is to collect and 
analyze message content. If this were to be done, we could not only ensure that legitimate 
mail is not being contained, but a matching algorithm could be used to better associate IPv4 
and IPv6 spamming addresses assuming that a spammer would send the same spam content 
in IPv4 and IPv6. Finally, analyzing message content would allow us to better refine our 
filtration methods, which in turn would improve the accuracy of our data. 

Further investigation into IPv6 spamming ASNs. A decent amount of follow up work 
could be conducted in the association of IPv6 spamming ASNs. Using the data from our 
BGP correlation algorithm, we could investigate trends over time to determine if there are 
certain dominant IPv6 spamming ASNs. Further investigation could determine if IPv6 
spamming ASNs are consistent with IPv4 spamming ASNs and to determine if the rollout 
of IPv6 availability has any relation to IPv6 spamming entities. 

Greater period of data collection and measurement analysis. Unfortunately, there was 
a limited amount of spam observed from example. com and the trends observed from the 
BGP correlation algorithm were not quite as dynamic as expected. As a result, more of 
these measurements need to be conducted over longer periods of time. As IPv6 expands, 
more data could be collected, allowing us a better chance of discovery of IPv6 spamming 
behaviors. Prevalence in certain areas of the Internet, measured growth over time, and 
new IPv6 spamming methods are all areas of measurement that will assist in identifying, 
characterizing, and monitoring spam in IPv6. 
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Appendix 


#!/usr/bin/env python 

# Program: $Id: playerv3.py 258 2014—08—11 20:16:39Z rbeverly $ 

# Authors: Robert Beverly <rbeverly @ nps . edu> / Mark Turner <mjturner@ nps . edu> 

# Purpose: Determine lifetime of the IPv6 BGP prefixes announcing 

# observed IPv6 spam 

# Pseudo—code: 

# 

# tree - new( radix trie) 

# last_spam_time = 0 

# for spam in sorted (spams ): 

# current_spam_time = time (spam) 

# playAllBGP (last_spam_time , current_spam_time ) 

# addspam (spam , tree ) 

import sys 
import os 
import subprocess 
import radix 
import csv 
import time 
import datetime 

class BGPUpdates : 

def_init_(self, dumpdir , verbose = False ): 

self . bgpdump=" / home / libbgpdump —1.4.99.1 3/bgpdump" 
self . dumpdir=dumpdir 
self.verbose=verbose 
self.rtree = radix . Radix () 

self.files = [ f for f in os . 1 i s t d i r ( s e 1 f . dumpdir) if f . find (’ updates .’) != —1] 

self . files . sort () 

self . last_update_time = 0 

self . current_file = None 

self.proc = None 

self . grabNextFile () 

se 1 f . spamdict = diet () 

def processUpdates ( self , end): 

print "^Processing^BGP^Updates^from : " , self . last_update_time , "to:", end 

(announce, withdraw) = (0,0) 

while ( self . last_update_time < end): 

(tstamp, msgtype , prefix, origin) = s e 1 f . grabOneUpdate () 
if self . verbose : print "tstamp:", tstamp, "msgtype:", msgtype, 

"prefix:", prefix, "origin:", origin 
s e If . updateTrie (tstamp , msgtype, prefix, origin) 
if msgtype == ’A’: announce+=l 


39 





elif msgtype -= ’W’ : withdraw += 1 
self . last_update_time = tstamp 
return (announce, withdraw) 

def updateTrie (self , tstamp, msgtype, prefix, origin): 
if msgtype == ’A’ : 

if not self . rtree . search_exact ( prefix ): 
mode = self . rtree . add( prefix ) 
mode . data [" spams " ] = [] 
mode . data [" announced" ] = tstamp 
mode . data [" origin " ] = origin 
#else : 

# print "Announcement for existing prefix ", prefix 
elif msgtype == ’W’ : 

mode = self . rtree . search_exact ( prefix ) 
if mode : 

spams = mode . data [" spams" ] 
announced = mode . data [" announced " ] 
life = tstamp — announced 
#if {life > 1000): 

# print "\tWithdraw:", prefix, "announced:", announced, "lifetime: 
if (len(spams) > 0 && life <=90000) or se 1 f . verbose : 
print " \ tWithdraw : " , prefix, "announced:", announced, 
"withdrawn:", tstamp, "lifetime:", life 
print " \ tWith^associated^spam^messages : " 
for spamid in spams: 

print "\t\t", s e 1 f . spamdict [ spamid ] 
self, rtree . delete(prefix) 
else : 

print "Unknown^BGP^message^type . " 
sys.exit(0) 

def grabNextFile ( self ): 
if len(self. files) <= 0: 

if se 1 f . verbose : print " Reached^end^of^update^ f i 1 e s . " 
return False 

nextfile = se 1 f . f i 1 e s . pop (0) 

self . current_file = self.dumpdir + ’/’ + nextfile 
if s e 1 f . verbose : print "Opening:", self . current_f i le 

self.proc = subprocess.Popen([self.bgpdump, m" , self . current_fi le ], 
stdout = subprocess.PIPE , stderr = subprocess.PIPE) 
return True 

def grabOneUpdate ( self ): 

(tstamp, msgtype, prefix, originAS) = (None, None, None, None) 
while (True): 

line = self . proc . stdout . readline (). strip () 
if len(line) -= 0: 

if self. grabNextFile () == False: 


/ ife 
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break 

line = self . proc . stdout . readline (). strip () 

# for some reason , there are corrupt records in the MRT 
pureascii = * ’ .join ([i if ord (i) < 128 else ’’ for i in line]) 
if len(line) != len ( pureascii ): 

print " Line^was^corrupt : " , line 
continue 

# at this point, we have ascii. check number of tokens. 
tokens = line . split ( * I ’) 

if (len(tokens) != 15) and (len(tokens) != 6): 

print "Bad^number^of^tokens : " , len (tokens), "Line:", line 

continue 

(tstamp, msgtype , prefix) = (tokens [1], tokens [2] , tokens [5]) 
tstamp = int(tstamp) 
originAS = 0 

# invariant: at this point , we have a good MRT record. 
if msgtype == ’A’ : 

try : 

aspathstr = tokens [6]. split () 

origin = aspathstr [ —1]. translate (None , ’{}’) 

# deal with AS sets 
if origin . find (’,’) != —1: 
as_set = origin . split (’,’) 
origin - as_set[ —1] 
originAS = int(origin) 
except Exception , e: 

print "Error:", e, "Line:", line 
continue 

# all good at this point. break out of while and return 

break 

return (tstamp, msgtype, prefix, originAS) 

def addSpam ( self , tstamp, addr , spamid): 
mode = self . rtree . search_best ( addr) 
if mode : 

rnode.data[" spams " ]. append ( spamid ) 

print " ^Adding^SpamlD : " , spamid, " to^prefix : " , mode . prefix 
self . spamdict [spamid] = (tstamp, addr) 
else : 

print ^Couldn ’ t^find^route^to : " , addr 

if_name_== "_main_": 

abusive6=’/ home / research / abusive6 / ’ 

bgp = BGPUpdates ( dumpdir=abusive6+ ’ bz2dump ’ , verbose = True ) 
csvfile = open (’blah . csv ’ , ’rU’) 
spamreader=csv . reader ( c svfile , delimiter^,*) 
spamid = 0 

for row in spamreader: 

(timestring , addr) = row 
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tstamp = int (time . mktime (time . strptime (timestring , "%n/%d/%y l _ J %I))) 

print "^New^spam. ^Timestamp , tstamp, "Addr:", addr 

(a,w) = bgp . processUpdates (tstamp ) 

bgp . addSpam (tstamp , addr, spamid) 

spamid+=l 

print "^Processed", a, "BGP^announcements , " , w, "withdrawals." 
csvfile . close () 
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